NLP-enhanced Error Checking for Catalan Unrestricted Text

Authors

  • Toni Badia
  • Angel Gil
  • Martí Quixal
  • Oriol Valentín
Abstract

We present a general-purpose spelling and grammar error detection architecture for unrestricted Catalan text. The architecture builds on an existing shallow morphosyntactic parser, which had to be adapted to handle ill-formed input successfully. The goal of this research is an architecture that can be used to develop morphosyntactic error checkers for both native and non-native speakers. We briefly describe how we are currently customizing the architecture in two different projects, as well as a means of annotating and exploiting error corpora (which ultimately condition the implementation of error checkers). We conclude with some remarks and future work.

Similar resources

ALLES: Integrating NLP in ICALL Applications

This paper describes how mature NLP that has been successfully applied in the area of controlled language checking can be used to deliver intelligent CALL applications. It describes how an autonomous, long-distance second-language learning system for advanced learners can be created. The architecture of the system consists of a multimodal user interface, a set of skill-specific learning tools, ...

A Survey of Spelling Error Detection and Correction Techniques

Spelling correction is the process of detecting incorrectly spelled words in a text and, in some cases, suggesting replacements. A spell checker is an application program that flags words in a document that may not be spelled correctly. It may be a stand-alone program capable of operating on a block of text, or part of a tool such as a word processor or electronic dictionary. When some text is given as an input to spell chec...

CUCWeb: A Catalan Corpus Built From The Web

This paper presents CUCWeb, a 166 million word corpus for Catalan built by crawling the Web. The corpus has been annotated with NLP tools and made available to language users through a flexible web interface. The developed architecture is quite general, so that it can be used to create corpora for other languages.

Integrating Dictionary and Web N-grams for Chinese Spell Checking

Chinese spell checking is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. Nevertheless, compared to spell checkers for alphabetical languages (e.g., English or French), Chinese spell checkers are more difficult to develop because there are no word boundaries in the Chinese writing system and errors may be caused by various ...

Publication date: 2004